Abstract

Background: The applications of massively parallel sequencing technology to fetal cell-free DNA (cff-DNA) have 
brought new insight to non-invasive prenatal diagnosis. However, most previous research based on maternal 
plasma sequencing has been restricted to fetal aneuploidies. To detect specific parentally inherited mutations, 
invasive approaches to obtain fetal DNA are the current standard in the clinic because of the experimental 
complexity and resource consumption of previously reported non-invasive approaches. 

Methods: Here, we present a simple and effective non-invasive method for accurate fetal genome recovery
assisted by parental haplotypes. The parental haplotypes were first inferred using a combined strategy of trios
and unrelated individuals. Assisted by the parental haplotypes, we then employed a hidden Markov model to
non-invasively recover the fetal genome through maternal plasma sequencing.

Results: Using a sequence depth of approximately 44X at an approximately 5.69% cff-DNA concentration, we
non-invasively inferred fetal genotype and haplotype under different situations of parental heterozygosity. Our data 
show that 98.57%, 95.37%, and 98.45% of paternal autosome alleles, maternal autosome alleles, and maternal 
chromosome X in the fetal haplotypes, respectively, were recovered accurately. Additionally, we obtained efficient 
coverage or strong linkage of 96.65% of reported Mendelian-disorder genes and 98.90% of complex disease- 
associated markers. 

Conclusions: Our method provides a useful strategy for non-invasive whole fetal genome recovery. 



Genome Medicine 



Background 

Prenatal diagnosis is one of the most efficient approaches 
to decrease the incidence of birth defects [1]. Traditionally, 
fetal cells for prenatal diagnosis are collected invasively by 
the procedures of amniocentesis or chorionic villus sam- 
pling (CVS), but these carry a risk of miscarriage [2,3]. To 
reduce the requirement of invasive testing, non-invasive 
approaches, such as the use of maternal serum markers 
and ultrasound, are widely used in the clinic to classify 
low- and high-risk pregnant women with Down's 
syndrome fetuses. However, these non-invasive prenatal
screens are unsatisfactory to many clinicians and pregnant
women due to their false-positive and potential false-negative
rates [4-6]. With the discovery of cell-free fetal DNA
(cff-DNA) in maternal plasma [7-10] and the emergence
of high-throughput sequencing, the clinical application of
non-invasive tests to detect fetal chromosomal abnormalities
using maternal plasma sequencing has been discussed
[11-13].

Theoretically, it should be possible to recover the fetal 
genome non-invasively through maternal plasma sequen- 
cing to enable the comprehensive prenatal diagnosis of 
Mendelian diseases and lessen the need for invasive proce- 
dures [14,15]. In 2010, Lo's group showed the feasibility of 
non-invasive fetal whole genome recovery and inferring
the fetal genotype, although they did not assess the biparentally
heterozygous sites in the fetal genome [14]. Recent
studies from Kitzman et al. [15] and Fan et al. [16] introduced
accurate non-invasive fetal genotype inference
methods assisted by maternal haplotype, but their methods
showed uncertain performance in detecting paternal
transition in low cff-DNA concentrations. In early gesta- 
tion, the concentration of cff-DNA is approximately 3% to 
6% of the total cell-free DNA [17], which may lead to 
uneven recovery of the paternal allele in the whole gen- 
ome. Robust strategies of noninvasively detecting both 
maternal and paternal alleles are still needed. Moreover, 
the fetal haplotype information is especially useful in 
detecting some haplotype-related diseases, such as sys- 
temic lupus erythematosus [18], as well as personal geno- 
mic analyses in the future [19,20]. 

Here, we developed a novel strategy of fetal genome 
recovery, inferring the fetal genotype as well as haplo- 
type at the same time. Given the fact that the fetal gen- 
ome is the combination of parentally transmitted 
chromosomes, we reconstructed the fetal genome by 
observing parental allele transition in maternal plasma. 
We first used a combined strategy of trios and unrelated 
individuals to construct parental haplotypes, and then 
observed the parental allele transition in maternal 
plasma and optimized the fetal haplotype using a hidden 
Markov model (HMM) and Viterbi algorithm. Thereby, 
we recovered the fetal haplotype as well as the genotype 
against all parental heterozygosity in one step. Our 
method highlights the prospective value to translational 
medicine of non-invasive prenatal diagnosis to recover 
the fetal genome using maternal plasma sequencing. 

Methods 

Sample preparation 

In this study, a Chinese couple and both parents of the 
couple were recruited with written informed consent. 
Also, this study was approved by the institutional review 
board of BGI-Shenzhen and conducted in accordance 
with the Declaration of Helsinki. 

Peripheral blood

We collected 10 mL of peripheral blood from a woman 
with pregnancy of 13 weeks of gestation, 5 mL of per- 
ipheral blood from her husband, and 10 mL of fetal 
umbilical blood after the delivery. Blood samples from 
each participant were collected in EDTA-containing 
tubes. 

Maternal plasma 

We obtained maternal plasma from 10 mL maternal per- 
ipheral blood after centrifugation at 1,600 g for 10 min. 
Great care was taken in the collection of plasma samples 
to avoid taking the buffy coat or any blood clots. Plasma
was transferred to 2.0 mL Eppendorf tubes and centrifuged
at 16,000 g for 10 min to remove residual cells.
Blood samples and plasma samples were stored at -20°C 
and -80°C, respectively, until further processing. 

Saliva

Saliva was collected from grandparents using Oragene ® 
OG-250 tubes and kits, following the standard manufac- 
turer's instructions. 

DNA extraction 

DNA from whole blood, saliva, and maternal plasma
was extracted using a TIANamp Micro DNA Kit
(Tiangen) according to the manufacturer's instructions.

Library preparation and massively parallel genomic sequencing

Genomic DNA 

One microgram of g-DNA was sheared by an S2 sonica- 
tor (Covaris, Inc.), yielding fragments between 100 and 
500 bp, with a predominance of 300 bp. For massively 
parallel genomic sequencing, approximately 1 μg of fragmented
g-DNA was prepared for library construction.
Briefly, DNA fragments were blunt-ended using T4 
DNA polymerase (Enzymatics), Klenow polymerase 
(Enzymatics), and T4 polynucleotide kinase (Enzymatics) 
and were ligated to adapters after addition of terminal A 
nucleotides. The adapter-ligated DNA fragments in the 
range of 300 to 350 bp were size-selected using 2% agar- 
ose electrophoresis and then amplified using a 10-cycle 
PCR. An Agencourt AMPure 450 mL Kit was used for 
the purification of PCR products. 

Plasma DNA

Plasma DNA (10 to 50 ng) was used for library prepara- 
tion according to a modified protocol, in which a 17- 
cycle PCR was conducted to enrich adapter-ligated 
DNA fragments. 

Library QC and sequencing

The libraries were quality-controlled by using an Agilent 
DNA 1000 kit on the 2100 Bioanalyzer (Agilent) plat- 
form and quantified by real-time PCR. DNA libraries 
were hybridized to the surface of sequencing flowcells, 
and DNA clusters were generated after amplification. 
The libraries were then sequenced using the Illumina
HiSeq™ 2000 sequencing system according to the manufacturer's
instructions. The sequence reads of this parent-offspring
trio have been uploaded to the NCBI SRA
database (SRA060043). 

Illumina DNA microarray 

The construction of the library and scanning of the 
microarray (Omni 2.5 SNP-array) were done according 
to the manufacturer's instructions for the corresponding 
array and for iScan.

Bioinformatics

Bioinformatic analyses are described in Additional
file 1 (Supplementary Methods).

Results 

Accurate fetal genome recovery through maternal plasma 

To perform haplotype-assisted accurate non-invasive 
fetal whole genome recovery through maternal plasma 
sequencing (Figure 1), we recruited a Chinese woman 
with pregnancy of 13 weeks of gestation and her family, 
including three generations, as well as fetal blood after 
delivery. We then performed approximately 44X and 
20X whole genome shotgun sequencing of the plasma 
sample and of parental genomic DNA (g-DNA), respec- 
tively (Table 1). The cff-DNA concentration of this male 
fetus was estimated as 5.69% using the biparentally 
homozygous sites. Illumina Infinium HD Human610- 
Quad BeadChip was used to genotype the gDNA from 
grandparents to construct the parental haplotypes. Also, 
we used Illumina HumanOmni2.5-8 BeadChip to vali- 
date the accuracy of parental SNP calling, in which the 
parental genotypes were validated as approximately 
99.22% consistent with the array (Table 1 and Additional
file 1, Table S1).
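
The fetal fraction estimate mentioned above can be illustrated with a minimal sketch. It assumes that "biparentally homozygous sites" means sites where the mother and father are homozygous for different alleles, so the fetus is an obligate heterozygote and the paternal allele is expected in about half of the fetal reads. The function name and the toy read counts are hypothetical; the authors' actual procedure is given in their Supplementary Methods and may differ.

```python
def estimate_fetal_fraction(sites):
    """Estimate the cff-DNA fraction from read counts at informative sites.

    sites: iterable of (maternal_allele_reads, paternal_allele_reads) pairs,
    one pair per site where the mother is homozygous AA and the father is
    homozygous BB (assumed definition of an informative site).
    """
    sites = list(sites)
    paternal = sum(p for _, p in sites)
    total = sum(m + p for m, p in sites)
    if total == 0:
        raise ValueError("no reads at informative sites")
    # The fetus is AB at these sites, so the B allele makes up f/2 of all reads.
    return 2.0 * paternal / total


# Toy counts (hypothetical), pooled over three sites.
toy_sites = [(97, 3), (95, 2), (98, 4)]
print(f"estimated fetal fraction: {estimate_fetal_fraction(toy_sites):.4f}")
```

Pooling counts over many sites before dividing, rather than averaging per-site ratios, keeps low-depth sites from dominating the estimate.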

We then performed a parental haplotype construction 
with a combined strategy of trios and unrelated indivi- 
duals. Although both trios and unrelated individuals could 
be applied to construct the parental haplotypes, the haplo- 
type ambiguity in trio strategy and the stratification in 
unrelated individual strategy would significantly restrict 
the value of either of these strategies. In this study, the 
parental haplotypes were obtained by BEAGLE [21] using 
their sequencing genotypes and the genotyping data of the
grandparents along with the newly released 51 parent-offspring
trios of Chinese Han in the 1000 Genomes project
(pilot II). By using this strategy, the inferred rate of paren- 
tal haplotypes increased, on average, from 90.32% to 100% 
compared to using a trio strategy only. 

Assisted by the parental haplotypes, we then devel- 
oped an efficient method for fetal whole genome recov- 
ery through maternal plasma sequencing. Ideally, in 
maternal plasma sequencing with a site-by-site strategy 
(SBSS), we could reconstruct the paternally transmitted 
allele directly by determining the nucleotide sequence of 
the paternal-specific allele at paternal-only heterozygous 
sites and determine the maternal transition by observing 
allelic imbalance at individual sites. However, the application
of this simple idea could be hindered by low cff-DNA
concentration and sequence depth. In our plasma
sequencing, approximately 57.84% of the paternal-specific
alleles were totally absent (Additional file 1, Figure S3). Additionally,
our estimation of the concentrations of three different 
alleles in plasma showed that 24,938 of 137,567 
(25.40%) sites showed an opposite allelic imbalance 
(Additional file 1, Supplementary Materials). These 
results indicate the infeasibility of SBSS in samples with 
low cff-DNA concentration and sequence depth. Thus, 
we introduced a sensitive HMM to identify the paren- 
tally transmitted allele and recombination breakpoints 
(Figure 2 and Additional file 1, Supplementary Meth- 
ods), in which we predicted the fetal haplotype on the 
paternal-only, maternal-only, and biparentally heterozy- 
gous sites in one step. With the use of the HMM and 
Viterbi algorithm, the fetal haplotypes of the 374,980 
markers (including chromosome X) were recovered suc- 
cessfully (Table 2). 
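
To make the HMM and Viterbi step concrete, the following is a deliberately simplified sketch and not the authors' model, which is described in their Supplementary Methods. It assumes only paternal-only heterozygous sites, a two-state chain (paternal haplotype 0 or 1 transmitted), binomial emissions for reads carrying the paternal-specific allele, and a constant switch probability between adjacent markers. All function names, parameters, and default values other than the 5.69% fetal fraction are hypothetical.

```python
import math


def log_binom(k, n, p):
    """Binomial log-likelihood of k hits in n reads (constant term dropped)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def viterbi_paternal(sites, fetal_fraction=0.0569, error=0.001, recomb=1e-4):
    """Most likely transmitted paternal haplotype (0 or 1) at each site.

    sites: list of (carrier, specific_reads, depth), where carrier is the
    paternal haplotype (0 or 1) carrying the allele the mother lacks.
    """
    if not sites:
        return []
    p_present = fetal_fraction / 2.0 + error  # allele transmitted to the fetus
    p_absent = error                          # allele seen only through errors
    stay, switch = math.log(1.0 - recomb), math.log(recomb)
    score = [math.log(0.5), math.log(0.5)]
    back = []
    for i, (carrier, k, n) in enumerate(sites):
        emit = [log_binom(k, n, p_present if s == carrier else p_absent)
                for s in (0, 1)]
        if i == 0:
            score = [score[s] + emit[s] for s in (0, 1)]
            continue
        new, ptr = [], []
        for s in (0, 1):
            cands = [score[t] + (stay if t == s else switch) for t in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new.append(cands[best] + emit[s])
        score = new
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: haplotype 0 is transmitted for five markers, then haplotype 1.
toy = [(0, 2, 40), (1, 0, 40), (0, 3, 40), (1, 0, 40), (0, 2, 40),
       (1, 2, 40), (0, 0, 40), (1, 3, 40), (0, 0, 40), (1, 2, 40)]
print(viterbi_paternal(toy))
```

The point of the chain is that a site with zero paternal-specific reads is only weak evidence on its own, but runs of neighboring sites jointly support one haplotype, and a change of state marks a candidate recombination breakpoint.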

Figure 1 The research principle of our study. To recover the fetal genome, we divided our work into several parts. We first recruited a family
that included three entire generations. The parental genotypes were determined by whole genome sequencing, whereas the grandparents' 
were determined by SNP array. We then constructed parental haplotypes with a combined trio and unrelated-individual strategy. Assisted by the 
parental haplotypes, we successfully recovered the fetal genome via maternal plasma DNA sequencing. Finally, we performed a validation using 
the child's cord blood after the delivery. 

Table 1 Data production

Microarray

| Samples | Type of DNA | Type of microarray | Call rate (%) | SNP calling (n) (10^5) |
|---|---|---|---|---|
| Grandparents (a) | g-DNA (saliva) | Human 610-Quad BeadChip | 99.70 ± 0.07 | 5.89 ± 0.004 |

WGS

| Samples | Type of DNA | Reads (n) (10^9) | Production (Gb) | Map rate (%) | Coverage (%) | Depth (fold) | Consistency in validation (%) |
|---|---|---|---|---|---|---|---|
| Father | g-DNA (blood) | 0.72 | 71.89 | 89.75 | 99.71 | 21.86 | 99.23 |
| Mother | g-DNA (blood) | 0.74 | 74.03 | 90.19 | 99.09 | 20.96 | 99.19 |
| Offspring | g-DNA (cord blood) | 0.72 | 72.17 | 90.64 | 99.75 | 21.32 | 99.25 |
| Plasma | Plasma DNA | 1.81 | 179.63 | 83.68 | 99.47 | 43.91 | |

(a) Mean ± standard deviation.



Accuracy of the recovered fetal haplotype 

We performed a final validation to estimate the overall
accuracy of the predicted fetal haplotype. To obtain the
standard (reference) fetal haplotype, we also performed whole
genome sequencing of the cord blood obtained after the
child's birth to approximately 20-fold coverage (Table 1
and Additional file 1, Table S1). The genotypes of the
child were determined using SOAPsnp and were
99.25% consistent with his genotypes from the
HumanOmni2.5-8 BeadChip. The standard haplotype of
the child was inferred by the same method as used for
his parents. Finally, the general accuracy of the paternal
alleles and maternal alleles was estimated by comparing
the recovered fetal genome with the standard haplotype

Figure 2 Identification of recombination breakpoints by HMM. This figure shows the HMM-based detection of recombination and the
predicted fetal haplotype. A genomic region on Chr3 (120-150 Mb) is shown with lines (red for paternal allele, blue for maternal allele)
indicating the logarithmic odds ratio between transmission probability of haplotype 1 and haplotype 0, which were computed by the HMM at 
each site. The color-coded chart (top) shows the predicted fetal haplotype as a combination of parental alleles. 

Table 2 The general accuracy of haplotype prediction

| Category | | Paternal allele (autosome) | Maternal allele (autosome) | Maternal allele (ChrX) |
|---|---|---|---|---|
| Consistent with g-DNA from cord blood | Loci (n, percentage) | 105,729 (98.57%) | 103,082 (95.37%) | 1,902 (98.45%) |
| Inconsistent with g-DNA from cord blood | Loci (n, percentage) | 1,529 (1.43%) | 5,005 (4.63%) | 30 (1.55%) |
| | Type I (noisy from haplotype inference) | 1,458 (95.36%) | 1,442 (28.81%) | |
| | Type II (recombination breakpoint related) | 71 (4.64%) | 3,295 (65.83%) | 24 (80.00%) |
| | Type III (centromere or chromosome edge related) | 0 (0%) | 268 (5.35%) | 6 (20.00%) |
| Total | | 107,258 | 108,087 | 1,932 |


of the child (Table 2). For the recovered paternal alleles, 
105,729 loci of our recovery were consistent with the 
standard haplotype, indicating a high accuracy of 
98.57%. For the recovered maternal autosomal alleles, 
103,082 loci were consistent with the standard haplo- 
type, for a slightly lower accuracy of maternal allele 
recovery of 95.37%. The maternal allele recovery on the 
chromosome X showed an accuracy of 98.45% (1,902/ 
1,932 loci). 

We further classified the recovery errors into different 
types (Table 2). Type I errors, which were randomly dis- 
tributed throughout the whole genome, explained 
95.36% and 28.81% of paternal and maternal recovery 
inaccuracies, respectively. We assume that type I errors 
were caused by the haplotype ambiguity during the par- 
ental or standard fetal haplotype inference. The type II 
errors, which mostly clustered next to the recovered 
recombination breakpoints, were most probably caused 
by the inaccuracy of recombination breakpoint recovery. 
This type of error explained the remaining 4.64% of the 
paternal allele recovery inaccuracies, 65.83% of the 
maternal autosome allele recovery mistakes, and 80.00% 
of chromosome X recovery mistakes, indicating difficul- 
ties in maternal recombination breakpoint determina- 
tion. The rest, referred to as type III errors, were related 
to heterochromatin close to the centromeres or chromo- 
some ends. Type III errors explained 5.35% of maternal 
autosome and 20.00% of chromosome X maternal allele 
recovery errors (Additional file 1, Table S5). 
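
As a quick consistency check, the accuracies and error shares quoted above can be recomputed from the locus counts in Table 2. This is a small sketch rather than part of the authors' pipeline; the dictionary layout and function name are hypothetical, and the blank ChrX type I cell is assumed to be zero because the type II and type III counts already sum to the 30 inconsistent loci.

```python
# Locus counts transcribed from Table 2.
# Assumption: the ChrX type I cell is blank in the table and is taken as 0.
TABLE2 = {
    "paternal autosome": {"consistent": 105729, "type_I": 1458, "type_II": 71, "type_III": 0},
    "maternal autosome": {"consistent": 103082, "type_I": 1442, "type_II": 3295, "type_III": 268},
    "maternal ChrX": {"consistent": 1902, "type_I": 0, "type_II": 24, "type_III": 6},
}


def summarize(row):
    """Return total loci, accuracy, and the share of each error type."""
    errors = row["type_I"] + row["type_II"] + row["type_III"]
    total = row["consistent"] + errors
    shares = {k: row[k] / errors for k in ("type_I", "type_II", "type_III")}
    return {"total": total, "accuracy": row["consistent"] / total, "error_shares": shares}


for name, row in TABLE2.items():
    s = summarize(row)
    print(f"{name}: {s['total']} loci, accuracy {100 * s['accuracy']:.2f}%")
```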

To estimate the correlation between sequencing data 
and the detection accuracy of the recovered fetal genome, 
we sampled a subset of data from the maternal plasma 
sequencing (Figure 3). Generally, the accuracy of the 
recovered fetal genome increased with the depth of 
maternal plasma sequencing. Because of the existence of
type I errors, the accuracy began to stabilize when the
sequence depth exceeded 20X. Additionally, the accuracy at
the paternal-only heterozygous sites indicated the robustness
of our method for paternal allele recovery at low cff-DNA
concentrations across different sequence depths.
For example, using only 6% of the plasma sequence data
(non-duplicate approximately 2.01X), we successfully
recovered 97.61% of the maternal-only heterozygous
sites.
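
The text does not describe how the subsets were drawn, so the following is only one plausible way to simulate lower depth: keep each read independently with a fixed probability (binomial thinning of per-site read counts). The function name, the seed, and the toy counts are hypothetical.

```python
import random


def thin_counts(counts, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    counts: per-site read counts from the full data set.
    Returns the per-site counts that survive the subsampling.
    """
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


full_depth = 33.60   # non-duplicate depth reported for the plasma sample
fraction = 0.06      # the 6% subset discussed in the text
print(f"expected depth after thinning: {full_depth * fraction:.3f}X")
print(thin_counts([40, 35, 50, 28], fraction, seed=1))
```

Re-running the haplotype recovery on thinned counts at several fractions would give an accuracy-versus-depth curve of the kind summarized in Figure 3.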

In summary, 98.57% of the paternal alleles, 95.37% of 
the maternal autosomes and 98.45% of the chromosome 
X were recovered precisely using approximately 43.91X 
(non-duplicate approximately 33.60X) maternal plasma 
sequencing with a 5.69% cff-DNA concentration. The 
quality of haplotype inference, the accuracy of the
recombination breakpoint prediction, and heterochroma- 
tin close to centromeres and chromosome ends explained 
most of our recovery errors. The simulations suggested 
the robustness of our method at lower sequence depth 
and low cff-DNA concentration, especially for paternal 
allele recovery. 

The application of non-invasive fetal genomics 

Inheritable genetic disease screening and Mendelian 
character predictions are two important applications of 
accurate fetal genome recovery. So far, 7,895 pathogenic 
genes related to Mendelian diseases have been released 
by OMIM (Online Mendelian Inheritance in Man [22]), 
96.65% of which were directly covered by or strongly 
linked with our 374,980 effective marker loci. In the 
case of complex diseases, 98.90% of 6,939 disease-asso- 
ciated loci from the NHGRI GWAS Catalog [23] were 
directly covered by or strongly linked with our marker 
loci. Interestingly, a TC genotype at rs17822931 (Chr16:
46,815,699; predicted accurately) is consistent with the 
offspring having earwax of the wet type [24], which is 
not typical in Asians and Native Americans [25]. The 
level of throughput of disease and trait screening implies 
that noninvasive prenatal diagnosis/screening can have 
high detection efficiency. The strategy of using three 
generations of a family increased the robustness of 
detecting rare mutations, showing a similar performance 
between common mutations and rare mutations (Additional
file 1, Figure S4). Moreover, based on our accurately
recovered haplotype, heritable complex variations

Figure 3 The relationship between accuracy and sequence depth. The x-axis shows the average non-duplicate sequence depth. The color-coded curves denote statistics at different kinds of sites (blue:
autosome, maternal-only heterozygous sites; red: autosome, paternal-only heterozygous sites; green: autosome, biparentally heterozygous sites; 
orange: ChrX, maternal heterozygous sites). 



(such as long insertions/deletions, translocations, rearrangements,
and even disease-related methylation),
which, unlike SNPs, are hard to observe directly in
maternal plasma, could be mapped to the fetal genome
through their linkage disequilibrium relationships using other
existing techniques.

Discussion 

Here we report a haplotype-assisted approach for non- 
invasive fetal whole genome recovery. The characteristics 
of our method and two previously reported haplotype- 
assisted methods [15,16] are summarized in Table 3. We 
further assessed the fetal genotype accuracy to compare 
the practical performance in corresponding cases to 
reach comprehensive conclusions (Table 4). 

First, all three fetal genome recovery strategies employed
parental haplotypes, but with different inference strategies.
We used a common genetics approach to determine
the parental haplotypes by using the genotyping data of 
surviving grandparents or born offspring. This approach 
provided a practical strategy for non-invasive fetal gen- 
ome screening for families with probands, especially for 
families with born offspring with Mendelian diseases. 
However, this specific sample recruitment would restrict 
its prospects for clinical application. To overcome this 
limitation, Kitzman et al. [26] and Fan et al. [27] have
constructed maternal haplotypes directly using noninvasive
experimental approaches. However, the time and
resource consumption of their complex experimental 
methods would restrict their clinical application. For 
example, it would take approximately 8 days and another 
US$3,678 to prepare the fosmid clone library [26]. 

Second, all three strategies used the maternal
haplotype, but only ours also used the paternal haplotype. For paternal-only
heterozygous sites, Kitzman et al. performed
SBSS to detect the paternal-specific allele. In SBSS, one
or more reads matching the paternal-specific allele are
taken as evidence of its transition. However, the performance
of SBSS depends on cff-DNA concentration and
sequence depth. For example, 96.80% of alleles in the trio
labeled I1 in the Kitzman et al. study (WGS approximately
78X, cff-DNA concentration approximately 13%)
were predicted correctly. However, in the case of low cff-DNA
concentration, such as trio G1 (WGS approximately
56X, at 8.14 weeks) reported by Kitzman et al.,
only 60.3% of paternal-specific alleles were identified correctly.
Therefore, Fan et al. imputed the paternal allele
using data from the 1000 Genomes project. In total,
approximately 70% of the paternally transmitted alleles
were reconstructed with an accuracy of 93% to 97%. This
implied the limited efficiency and general applicability of SBSS,
even with population-scale sequencing for imputation.
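
The dependence of SBSS on cff-DNA concentration and depth can be illustrated with a back-of-the-envelope model. It assumes that each read at a site independently carries a transmitted paternal-specific allele with probability f/2 (f being the fetal fraction), and it ignores sequencing errors and site-to-site depth variation; the function name is hypothetical and the model is not taken from any of the three studies.

```python
def prob_allele_unseen(fetal_fraction, depth):
    """Probability that a transmitted paternal-specific allele gets no reads.

    Each of `depth` reads is assumed to carry the allele independently
    with probability fetal_fraction / 2.
    """
    return (1.0 - fetal_fraction / 2.0) ** depth


# Fetal fractions and depths are taken from Table 4; the model itself is a simplification.
for label, f, d in [("current study", 0.0569, 44),
                    ("Kitzman trio I1", 0.13, 78),
                    ("Kitzman trio G1", 0.06, 56)]:
    print(f"{label}: P(unseen) = {prob_allele_unseen(f, d):.3f}")
```

Under these assumptions, about 0.28 of transmitted paternal-specific alleles would receive no supporting read at a 5.69% fetal fraction and 44X depth, which is why a per-site rule struggles in early gestation while a haplotype-level model can pool evidence across linked markers.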

Table 3 Comparison of fetal genome recovery methods

| Category | Current study | Kitzman et al. | Fan et al. |
|---|---|---|---|
| Parental haplotype inference | | | |
| Method for parental haplotype construction | Trio strategy with corresponding grandparents and CHS | Maternal: fosmid-based approach [26]. Paternal: could not be assessed due to low molecular weight of saliva DNA | Maternal: single-cell approach [27]. Paternal: not collected |
| Strategy for fetal genome recovery | | | |
| Paternal allele | Two different alleles of fetal haplotype, transmitted from the two parents, were reconstructed by an HMM in one step, including transmitted chromosomes and recombination breakpoints | SBSS | SBSS + imputation |
| Maternal allele | (as for the paternal allele) | For maternal-only heterozygous sites, they used AlE to determine whole-block transitions and HMM to identify assembly errors and recombination breakpoints. For biparentally heterozygous sites, maternal alleles were determined by maternal-only heterozygous sites within the same block | Allele imbalance estimated by counting nucleotides specific to each of the two maternal alleles |
| Recovery of fetal genome | | | |
| Genotype | Yes | Yes | Yes |
| Haplotype | Yes | No | No |
| De-novo mutation | No | Yes | No |



The dependence on cff-DNA concentration and 
sequence depth also limits the application of SBSS to 
early gestation. Therefore, it is an advisable strategy to 
use the paternal haplotype for noninvasive fetal whole 
genome recovery, especially for Mendelian disease 
diagnosis. 



Third, the results of the fetal genome recovery
differed among the three studies. Kitzman et al. and
Fan et al. focused on fetal genotype inference. We tried
to recover both the fetal haplotype and genotype because
haplotype information is important for screening complex diseases,
such as systemic lupus erythematosus [18].



Table 4 Practical performance comparison between fetal genome recovery methods

| Category | Current study | Kitzman et al., Trio I1 | Kitzman et al., Trio G1 | Fan et al., P1T1 | Fan et al., P1T2 | Fan et al., P2T3 |
|---|---|---|---|---|---|---|
| Gestational week | 13 | 18.5 | 8.14 | 9 | 29 | 39 |
| Estimated average cff-DNA concentration | 5.69% | 13% (a) | 6% (a) | 6% (a) | 16% (a) | 30% (a) |
| Average sequence depth (fold) | 43.91 | 78 (a,b) | 56 (a,b) | 52.7 | 20.8 | 10.7 |
| Fetal gender | Male | Male | - | Female | Female | Female |
| Maternal allele | | | | | | |
| Predicted rate | 100% | 91.4% | - | >99.2% | - | - |
| Prediction accuracy | 95.37% (autosome), 98.45% (ChrX) | - | - | >99.8% | - | - |
| Paternal allele | | | | | | |
| Predicted rate | 100% | - | - | 71.60% | 72.84% | 72.94% |
| Prediction accuracy | 98.57% | - | - | 93.79% | 95.84% | 96.56% |
| Accuracy of inferred fetal genotype | | | | | | |
| Autosome, paternal-only heterozygous | 99.12%, n = 65,409 | 96.8% | 60.3% | - | - | - |
| Autosome, maternal-only heterozygous | 95.84%, n = 66,238 | 99.3% (c) | 95.7% (c) | - | - | - |
| Autosome, biparentally heterozygous | 94.90%, n = 41,849 | 98.7% (d) | 91.3% (d) | - | - | - |
| ChrX, maternal-only heterozygous | 98.45%, n = 1,932 | - | - | - | - | - |

- = No data.
(a) Approximate.
(b) Non-duplicate.
(c) Estimated based on maternal phased sites.
(d) Accuracy of transmitted maternal allele prediction, estimated based on maternal phased sites.

Besides helping in haplotype-related disease detection,
accurate fetal haplotype prediction might be helpful to 
identify fetal de-novo copy number variations, such as 
aneuploidy or even microdeletion and microduplication 
syndromes (Additional file 1, Supplementary Materials). 

Fourth, only Kitzman et al. performed a fetal de-novo 
mutation identification. These mutations were expected 
to appear within the maternal plasma as rare alleles, like 
paternal-specific alleles [15]. Ideally, SBSS can identify 
the fetal de-novo mutations easily. Therefore, Kitzman 
et al. achieved 88.60% sensitivity of high-confidence 
fetal de-novo mutations. However, the systematic error, 
which was dominated by errors originating during poly- 
merase chain reaction (PCR), introduced hundreds of 
noisy signals. Even with stringent filters, >99% of the
candidate sites were false-positive, implying a specificity
of approximately 0.84% in trio I1. Moreover, as mentioned
above, the sensitivity of SBSS depends on cff-DNA
concentration and sequence depth. In our case,
only 68.97% of the high-confidence fetal de-novo muta- 
tions could be identified (Additional file 1, Table S4). 
Hence, effective algorithms are still required for fetal de- 
novo mutation identification. 

In conclusion, all three of these methods provide pro- 
mising solutions for non-invasive fetal genome recovery, 
with different strengths and weaknesses. The performance 
of paternal allele recovery indicated the requirement of the 
paternal haplotype, especially for non-invasive Mendelian 
disease detection. Therefore, it is wise to choose a suitable 
approach to obtain the parental haplotype based on the 
clinical reality, and our data show that a strategy with 
additional relative samples would be an alternative 
method. Our strategy of parental haplotype inference pro- 
vided a practical solution to detect fetal Mendelian dis- 
eases non-invasively, especially for couples with a born 
proband. 

Currently, it costs US$41 to generate a single gigabyte 
of sequence data with the Illumina HiSeq 2000 platform 
[28]; therefore, it would cost at least US$14,000 to gener- 
ate the sequence data in this study. With developing 
technology, the price of the sequencing will drop to US 
$1,000 per genome in the foreseeable future [29]. Conse- 
quently, sequence-based approaches will become practi- 
cal for non-invasive fetal whole genome recovery. 
For Mendelian diseases, combining our method
with exome sequencing technology would cost only
US$1,200-1,400 for each family with a born proband
(including 30X exome sequence coverage for the couple,
the born proband, and the maternal plasma), which is
affordable for many families. Moreover, the developing
sequence platforms with shorter turnaround times will 
significantly broaden the application of sequence-based 
approaches for fetal genome recovery. For example, the 
MiSeq platform takes <48 h for PE 150 sequencing,
meaning pregnant women could receive their results
within 1 week. Thus, the advantage of short turnaround
times makes us confident that sequence-based 
approaches for fetal genome recovery will play an 
increasingly important role in the future. 

The development of non-invasive measures for fetal 
genome recovery will surely bring new insight to prenatal 
genetic diagnosis. In the case of fetal Mendelian disease 
identification, sequence-based approaches will provide 
fast and reliable options to pregnant women, reducing 
the use of unnecessary invasive procedures. Comprehensive
fetal genome sequencing with high accuracy not only
enables us to make definitive diagnoses but also provides
potential applications in personalized medicine, such as identification
of allergens [30]. In addition, the easy sampling
of sequence-based approaches makes them applicable across
gestational stages, which may be helpful to make appropriate
clinical decisions. For instance, a fetus diagnosed
with phenylketonuria in the third trimester would benefit 
from treatment immediately after delivery [16]. However, 
the increase in information available to parents will raise 
ethical questions. For example, in most cases, the influ- 
ence of a novel fetal mutation is hard to predict. Should a 
woman be informed if her fetus has a novel mutation of 
unpredictable consequence? The uncertainty of these 
mutations may increase the unnecessary anxiety of preg- 
nant women; however, the lack of such information 
would lead to improper decisions. Thus, the key concern 
is what kind of information would/should be reported, 
and this question should be thoroughly discussed within 
the scientific community and on a societal level. 

There are still several limitations of our approach hampering
further clinical application. First, the use of commercial
microarrays (grandparents or the CHS trios)
largely restricted our study to common SNPs. Therefore,
only a small fraction of SNPs were discussed in our study,
and we ignored most of the rare variations. In addition,
short indels, which could not be located in the parental 
haplotypes because of the lack of grandparental informa- 
tion, were excluded from our analysis. Short indels not 
only play an important role in Mendelian disease [31] but 
also show strong power as markers [32]. At present, target
sequences with abundant tag-SNPs are advisable for future 
studies. Second, our study focused on mutations at the 
DNA level, which excluded most of the haplotype-asso- 
ciated transgenerational epigenetic modifications [33]. The 
clinical application of fetal genome recovery will require 
more robust experimental breakthroughs and algorithms 
to explore the comprehensiveness of the genome coverage. 
Third, although we successfully recovered a fetal genome 
in a case of a singleton pregnancy, accurate genome recov- 
ery in cases of twin pregnancy is still unattainable. 
Currently, the sequence-based approach for non-invasive 
prenatal diagnosis in twin pregnancies is restricted to 
aneuploidy [34]. The recovery of twins' genomes will
greatly broaden the horizon of non-invasive prenatal 
diagnosis. 

Conclusions 

In this study, we introduced an accurate method for 
fetal genome recovery in one step using maternal 
plasma sequencing. More than 95% of the fetal geno- 
types were inferred successfully, and most importantly, 
>95% of the fetal haplotypes were recovered precisely. 
As a proof of concept, we propose the clinical applica- 
tion of the recovered genome to non-invasive prenatal 
diagnosis/screening. In summary, we report an accurate 
and easy method for non-invasive fetal whole genome 
recovery by maternal plasma sequencing. An accurate 
fetal haplotype would enhance the dimensionality of 
fetal variation detection in prenatal diagnosis/screening 
and promote the development of fetal medicine. Our 
results indicate the potential of using sequencing tech- 
nology in prenatal diagnosis, and they should accelerate 
the application of sequencing technology in clinical 
trials. 